A dynamical system is controllable if by imposing appropriate external signals on a subset of its nodes, it can
be driven from any initial state to any desired state in finite time. Here we study the impact of various 
network characteristics on the minimal number of driver nodes required to control a network. We find that 
clustering and modularity have no discernible impact, but the symmetries of the underlying matching 
problem can produce linear, quadratic or no dependence on degree correlation coefficients, depending on 
the nature of the underlying correlations. The results are supported by numerical simulations and help 
narrow the observed gap between the predicted and the observed number of driver nodes in real networks. 
While during the past decade significant efforts have been devoted to understanding the structure,
evolution and dynamics of complex networks, only recently has attention turned to an equally
important problem: our ability to control them. Given the problem's importance, recent work has
extended the concept of pinning control and structural controllability to complex networks. Here we focus
on the latter approach. A networked system is considered controllable if by imposing appropriate external signals
on a subset of its components, called driver nodes, the system can be driven from any initial state to any final state
in finite time. As the control of a system requires a quantitative description of the governing dynamical rules,
progress in this area was limited to small engineered systems. Yet, recently Liu et al. showed that the identification
of the minimal number of driver nodes required to control a network, $N_D$, can be derived from the network
topology by mapping controllability to the maximum matching in directed networks. The mapping indicated
that $N_D$ is mainly determined by the degree distribution $P(k_{\mathrm{in}}, k_{\mathrm{out}})$. We know, however, that a series of
characteristics, from degree correlations to local clustering and communities, cannot be accounted for by
$P(k_{\mathrm{in}}, k_{\mathrm{out}})$ alone, prompting us to ask: which network characteristics affect the system's controllability?

The three most commonly studied deviations from the random network configuration are (i) clustering,
manifested as a higher clustering coefficient $C$ than expected based on the degree distribution; (ii) community
structure, representing the agglomeration of nodes into distinct communities, captured by the modularity
parameter $Q$; (iii) degree correlations. First, we motivate our work by showing that network characteristics
other than the degree distribution also affect network control. Next, we use numerical simulations to identify the
network characteristics that affect controllability, finding that only degree correlations have a discernible effect.
We then analytically derive $n_D = N_D/N$ for random networks with a given degree distribution and correlation
profile. More detailed calculations are provided in the Supplementary Information Sec. III. Finally, we test our
predictions on real networks.

Results 

Prediction based on the degree distribution. To motivate our study we compare the observed $N_D$ to the
prediction based on the degree sequence for several real networks. For this we randomize each network, preserving
its degree sequence, and calculate $N_D^{\mathrm{rand}}$, the number of driver nodes for the randomized network. Plotting
$N_D$ versus $N_D^{\mathrm{rand}}$ on a log-log scale indicates that the degree sequence correctly predicts the order of magnitude of
$N_D$ despite known correlations (Fig. 1a). However, by plotting $n_D = N_D/N$ versus $n_D^{\mathrm{rand}} = N_D^{\mathrm{rand}}/N$ we observe
clear deviations from the degree-based prediction (Fig. 1b). Our goal is to understand the origin of these
deviations, and the degree to which network correlations can explain the observed $n_D$.
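
The randomization step can be illustrated with a minimal sketch. It assumes the network is given as a list of directed links without duplicates and uses repeated double-edge swaps, one common way to preserve every node's in- and out-degree. The function names, the number of swaps and the choice to allow self-loops are illustrative assumptions, not details taken from the study.

```python
import random
from collections import Counter


def degree_sequences(edges):
    """Return (in-degree, out-degree) counters of a directed edge list."""
    out_deg = Counter(u for u, _ in edges)
    in_deg = Counter(v for _, v in edges)
    return in_deg, out_deg


def randomize_preserving_degrees(edges, n_swaps, seed=0):
    """Shuffle a directed edge list with double-edge swaps.

    Each accepted swap turns (a -> b, c -> d) into (a -> d, c -> b), so every
    node keeps its in- and out-degree. Swaps that would create a duplicate
    link are rejected; self-loops are allowed here (an assumption).
    """
    rng = random.Random(seed)
    edges = list(edges)
    present = set(edges)
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(edges)), 2)
        (a, b), (c, d) = edges[i], edges[j]
        new1, new2 = (a, d), (c, b)
        if new1 in present or new2 in present:
            continue  # would create a multi-edge; keep the old pair
        present -= {(a, b), (c, d)}
        present |= {new1, new2}
        edges[i], edges[j] = new1, new2
    return edges
```

Computing the number of driver nodes on several such randomized copies and averaging would give an estimate of $N_D^{\mathrm{rand}}$ under these assumptions.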
Figure 1 | (a) We compare $N_D$ for real systems to $N_D^{\mathrm{rand}}$, representing the number of driver nodes needed to control their randomized counterparts.
Randomization eliminates all local and global correlations, only preserving the degree sequence of the original system. We find that the degree sequence
predicts the order of magnitude of $N_D$ correctly; however, small deviations are hidden by the log scale, needed to show the whole span of $N_D$ seen in real
systems. (b) These deviations are more obvious if we compare the density of driver nodes $n_D = N_D/N$ and $n_D^{\mathrm{rand}}$ on a linear scale, finding that for some
systems (e.g. regulatory and p2p Internet networks) the degree sequence serves as a good predictor of $n_D$, while for other systems (e.g. metabolic networks
and food webs) $n_D$ deviates from the prediction based solely on the degree sequence.



Numerical simulations. We start from a directed network with a
Poisson or scale-free degree distribution. The scale-free
network is generated by the static model described in the Methods
section. We use simulated annealing to add various network
characteristics by link rewiring, while leaving the in- and out-degrees
unchanged, tuning each measure to a desired value (for
details see the Methods section). We compute $n_D$ using the
Hopcroft-Karp algorithm.

Clustering. We use the global clustering coefficient defined for
directed networks as



$$ C = \frac{3 \cdot \text{number of triangles}}{2 \cdot \text{number of adjacent edge pairs}}. \tag{1} $$



The simulations indicate that changes in $C$ only slightly alter $n_D$ and
that the effect is not systematic (Fig. 2a). Hence we conclude that $C$
plays a negligible role in determining $n_D$.

Modularity. We quantify the community structure using

$$ Q = \frac{1}{L} \sum_{v,w} \left[ A_{vw} - \frac{k_v^{(\mathrm{in})} k_w^{(\mathrm{out})}}{L} \right] \delta(c_v, c_w), \tag{2} $$

where $A_{vw}$ is the adjacency matrix, $L$ is the number of links, and $c_v$ and $c_w$ are the communities the
$v$ and $w$ nodes belong to, respectively. Specifying $Q$ still leaves a great
amount of freedom in the number and size of the communities. We
therefore choose to randomly divide the nodes into $N_q$ equally sized
groups, and increase the edge density within these groups, elevating
$Q$ to the desired value.
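
As an illustration of how $Q$ can be evaluated for such a planted division, the sketch below assumes the directed form of modularity in which the expected number of links between two nodes is the product of the out-degree of the source and the in-degree of the target, divided by the number of links. The helper names and the dictionary-based community assignment are hypothetical choices made for this example.

```python
import random
from collections import Counter


def equal_partition(n, n_groups, seed=0):
    """Randomly split nodes 0..n-1 into n_groups groups of (nearly) equal size."""
    rng = random.Random(seed)
    nodes = list(range(n))
    rng.shuffle(nodes)
    return {node: idx % n_groups for idx, node in enumerate(nodes)}


def directed_modularity(edges, community):
    """Modularity of a directed edge list for a node -> community mapping."""
    n_links = len(edges)
    out_deg = Counter(u for u, _ in edges)
    in_deg = Counter(v for _, v in edges)
    # Observed fraction of links that stay inside a community.
    inside = sum(1 for u, v in edges if community[u] == community[v]) / n_links
    # Expected fraction under the degree-preserving null model.
    out_total, in_total = Counter(), Counter()
    for node, group in community.items():
        out_total[group] += out_deg[node]
        in_total[group] += in_deg[node]
    expected = sum(out_total[g] * in_total[g] for g in out_total) / n_links ** 2
    return inside - expected
```

For two disconnected reciprocal pairs placed in separate communities this gives $Q = 0.5$, and $Q = 0$ when all nodes share one community.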

The simulations indicate that this community structure has no
effect on $n_D$ (Fig. 2b). While adding communities to networks can
be achieved in many different ways, and the effect of modularity
can be explored in more detail (e.g. hierarchical organization of
communities, overlapping community structure, etc.), we
have failed to detect systematic, modularity-induced changes in
$n_D$, prompting us to conclude that $Q$ does not play a leading role
in $n_D$.

Degree correlations. In directed networks each node has an in-degree
($k_i$) and an out-degree ($k_o$), thus we can define four correlation
coefficients: correlations between the source node's in- and out-degree,
and the target node's in- and out-degree (Figs. 3, 4). We
use the Pearson coefficient to quantify each correlation with a single
parameter:
Figure 2 | Effect of the clustering coefficient $C$ and modularity $Q$ on the density of driver nodes, $n_D$. Network size is $N = 10{,}000$. Each data point is an
average over 50 independent runs; the error bars, typically smaller than the symbol size, represent the standard deviation of the measurements.
Figure 3 | The impact of degree-degree correlations on the density of driver nodes ($n_D$) for the Erdős-Rényi model ($N = 10{,}000$) for average degrees $\langle k \rangle$
= 1 (red), $\langle k \rangle$ = 3 (green), $\langle k \rangle$ = 5 (blue), $\langle k \rangle$ = 7 (black) and $\langle k \rangle$ = 9 (orange). The results are similar for the scale-free model (see Fig. 4). Each data point
is an average of 100 independent runs.



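$$ r^{(\alpha-\beta)} = \frac{L^{-1} \sum_e j_e^{(\alpha)} k_e^{(\beta)} - \bar{j}^{(\alpha)} \bar{k}^{(\beta)}}{\sigma_j^{(\alpha)} \sigma_k^{(\beta)}}, \tag{3} $$

where $\sum_e$ sums over all edges, $L$ is the number of edges, $\alpha, \beta \in \{\mathrm{in}, \mathrm{out}\}$ is the degree type,
$j_e^{(\alpha)}$ is the degree of the source node and $k_e^{(\beta)}$ is the degree of the target
node of edge $e$. Here $\bar{j}^{(\alpha)} = L^{-1} \sum_e j_e^{(\alpha)}$ is the average degree of the nodes at the
beginning of each link and $(\sigma_j^{(\alpha)})^2 = L^{-1} \sum_e (j_e^{(\alpha)} - \bar{j}^{(\alpha)})^2$ is the variance; $\bar{k}^{(\beta)}$
and $\sigma_k^{(\beta)}$ are defined similarly.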
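
The four coefficients can be computed directly from an edge list. The sketch below is a plain implementation of the Pearson coefficient over links, with degrees taken from the edge list itself; the function name and the convention of returning NaN when one of the degrees has zero variance are assumptions made for this illustration.

```python
from collections import Counter
from math import sqrt


def degree_correlations(edges):
    """Return the Pearson coefficients r^(a-b), a, b in {in, out}, over links."""
    deg = {
        "out": Counter(u for u, _ in edges),
        "in": Counter(v for _, v in edges),
    }
    n_links = len(edges)
    result = {}
    for a in ("in", "out"):
        for b in ("in", "out"):
            j = [deg[a][u] for u, _ in edges]  # a-degree of each link's source
            k = [deg[b][v] for _, v in edges]  # b-degree of each link's target
            j_bar, k_bar = sum(j) / n_links, sum(k) / n_links
            cov = sum(x * y for x, y in zip(j, k)) / n_links - j_bar * k_bar
            var_j = sum((x - j_bar) ** 2 for x in j) / n_links
            var_k = sum((y - k_bar) ** 2 for y in k) / n_links
            if var_j == 0 or var_k == 0:
                result[f"{a}-{b}"] = float("nan")  # undefined without spread
            else:
                result[f"{a}-{b}"] = cov / sqrt(var_j * var_k)
    return result
```

For example, in a network where two low out-degree nodes point at one shared target while one node of out-degree 2 points at two separate targets, every link joins a low out-degree source to a high in-degree target or vice versa, so the out-in coefficient is exactly -1.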

Simulations shown in Figs. 3 and 4 indicate that degree correlations
systematically affect $n_D$. We observe three distinct types of
behavior:

(i) $n_D$ depends monotonically on $r^{(\mathrm{out-in})}$, so that low (negative)
correlations increase and high (positive) correlations lower
$n_D$ (Figs. 3c, 4c);
(ii) Both $r^{(\mathrm{in-in})}$ and $r^{(\mathrm{out-out})}$ increase $n_D$, independent of the sign of
the correlations (Figs. 3a, 3d, 4a, 4d);
(iii) $r^{(\mathrm{in-out})}$ has no effect on $n_D$ (Figs. 3b, 4b).

The behavior is qualitatively the same for Erdős-Rényi (Fig. 3) and
scale-free (Fig. 4) networks.

The diversity of these numerical results requires a deeper explanation.
Therefore, in the remainder of the paper we focus on
understanding analytically the role of degree correlations, which,
by systematically altering $n_D$, affect the system's controllability.

Analytical framework. The task of identifying the driver nodes can
be mapped to the problem of finding a maximum matching of the
network. A matching is a subset of links that do not share start or
end points. We call a node matched if a link in the matching points
at it, and we gain full control over a network if we control the
unmatched nodes. The cavity method has been successfully used
to calculate the size of the maximum matching for undirected
and directed network ensembles with a given degree distribution.
Here we study network ensembles with a given degree correlation
profile.
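
The mapping from matching to driver nodes can be sketched in a few lines. The example below uses simple augmenting paths (Kuhn's algorithm) rather than Hopcroft-Karp, which is adequate for small graphs only because it is recursive and slower. It also assumes nodes are labeled 0 to n-1 and that at least one driver node is needed even when every node is matched, following the usual statement of the minimum-input result; all names are illustrative.

```python
def maximum_matching_size(n, edges):
    """Size of a maximum matching of a directed graph with nodes 0..n-1.

    A matching is a set of links with no shared start and no shared end
    point, i.e. a matching in the bipartite graph of out-copies and in-copies.
    """
    adj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
    matched_by = [-1] * n  # matched_by[v] = source of the matching link into v

    def try_augment(u, seen):
        # Try to give u a matching link, re-routing earlier choices if needed.
        for v in adj[u]:
            if v in seen:
                continue
            seen.add(v)
            if matched_by[v] == -1 or try_augment(matched_by[v], seen):
                matched_by[v] = u
                return True
        return False

    return sum(try_augment(u, set()) for u in range(n))


def driver_nodes(n, edges):
    """Number of unmatched nodes, with a floor of one driver (an assumption)."""
    return max(n - maximum_matching_size(n, edges), 1)
```

A directed chain of four nodes needs a single driver node, whereas a hub pointing at three leaves needs three, because only one of the hub's links can be in the matching.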

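We calculate $n_D$ analytically for a given $P(k_{\mathrm{in}}, k_{\mathrm{out}})$ and selected
degree-degree correlation $e(j_{\mathrm{in}}, j_{\mathrm{out}}; k_{\mathrm{in}}, k_{\mathrm{out}})$, representing the probability
of a directed link pointing from a node with degrees $j_{\mathrm{in}}$ and $j_{\mathrm{out}}$
to a node with degrees $k_{\mathrm{in}}$ and $k_{\mathrm{out}}$. In the absence of degree correlations
(neutral case)

$$ e^{(0)}(j_i, j_o; k_i, k_o) = P^{(\mathrm{in})}(j_i)\, Q^{(\mathrm{out})}(j_o)\, Q^{(\mathrm{in})}(k_i)\, P^{(\mathrm{out})}(k_o), \tag{4} $$

where $Q^{(\mathrm{out})}(j_o) = j_o P^{(\mathrm{out})}(j_o) / \langle k \rangle$, $Q^{(\mathrm{in})}(k_i) = k_i P^{(\mathrm{in})}(k_i) / \langle k \rangle$ and $\langle k \rangle$ is
the average degree. To ensure analytical tractability we chose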
Figure 4 | The impact of degree-degree correlations on the density of driver nodes ($n_D$) for the scale-free model ($N = 10{,}000$, $\gamma = 2.5$) for average
degrees $\langle k \rangle$ = 1 (red), $\langle k \rangle$ = 3 (green), $\langle k \rangle$ = 5 (blue), $\langle k \rangle$ = 7 (black) and $\langle k \rangle$ = 9 (orange). The results are similar for the Erdős-Rényi model (see Fig. 3).
Each data point is an average of 100 independent runs.



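\begin{align}
e^{(\mathrm{in-in})}(j_i, j_o; k_i, k_o) &= Q^{(\mathrm{out})}(j_o)\, P^{(\mathrm{out})}(k_o) \left[ P^{(\mathrm{in})}(j_i)\, Q^{(\mathrm{in})}(k_i) + r^{(\mathrm{in-in})} m^{(\mathrm{in-in})}(j_i, k_i) \right], \tag{5a} \\
e^{(\mathrm{in-out})}(j_i, j_o; k_i, k_o) &= Q^{(\mathrm{out})}(j_o)\, Q^{(\mathrm{in})}(k_i) \left[ P^{(\mathrm{in})}(j_i)\, P^{(\mathrm{out})}(k_o) + r^{(\mathrm{in-out})} m^{(\mathrm{in-out})}(j_i, k_o) \right], \tag{5b} \\
e^{(\mathrm{out-in})}(j_i, j_o; k_i, k_o) &= P^{(\mathrm{in})}(j_i)\, P^{(\mathrm{out})}(k_o) \left[ Q^{(\mathrm{out})}(j_o)\, Q^{(\mathrm{in})}(k_i) + r^{(\mathrm{out-in})} m^{(\mathrm{out-in})}(j_o, k_i) \right], \tag{5c} \\
e^{(\mathrm{out-out})}(j_i, j_o; k_i, k_o) &= P^{(\mathrm{in})}(j_i)\, Q^{(\mathrm{in})}(k_i) \left[ Q^{(\mathrm{out})}(j_o)\, P^{(\mathrm{out})}(k_o) + r^{(\mathrm{out-out})} m^{(\mathrm{out-out})}(j_o, k_o) \right]. \tag{5d}
\end{align}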

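By fixing $m^{(\alpha-\beta)}(j, k)$ ($\alpha, \beta \in \{\mathrm{in}, \mathrm{out}\}$) we obtain a one-parameter
network ensemble characterized by $r^{(\alpha-\beta)}$, where $m^{(\alpha-\beta)}(j, k)$ satisfies
the constraints

$$ \sum_{j=0}^{\infty} m^{(\alpha-\beta)}(j, k) = \sum_{k=0}^{\infty} m^{(\alpha-\beta)}(j, k) = 0, \tag{6} $$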



^;W-«(/, fc) = i, 



(7) 



and all elements of $e^{(\alpha-\beta)}(j, k)$ are between 0 and 1.

Our goal is to understand the relation between $n_D$ and the degree
correlation coefficient $r^{(\alpha-\beta)}$. Assuming that $r^{(\alpha-\beta)}$ is small we treat the
correlations as perturbations to the neutral case, discussing the
impact of the four $r^{(\alpha-\beta)}$ correlations separately.
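
A short worked calculation may help to see what the perturbation parameter measures. It is a sketch under stated assumptions: the joint distribution of the two correlated degrees is written as a factorized part $a(j)\,b(k)$ plus $r\,m(j,k)$, and $m$ has vanishing row and column sums. The symbols $a$, $b$, $\bar{j}$, $\bar{k}$, $\sigma_j$ and $\sigma_k$ are shorthand introduced here for the marginals, means and standard deviations of the source and target degrees over links.

```latex
% Assumed form: e(j,k) = a(j) b(k) + r m(j,k), with
% \sum_j m(j,k) = \sum_k m(j,k) = 0, so the marginals a and b
% (and hence the means and variances) do not depend on r.
\begin{align*}
\bar{j} &= \sum_j j\,a(j), \qquad \bar{k} = \sum_k k\,b(k), \\
\sum_{j,k} jk\,e(j,k) - \bar{j}\,\bar{k}
  &= \underbrace{\sum_{j,k} jk\,a(j)\,b(k) - \bar{j}\,\bar{k}}_{=\,0}
   + r \sum_{j,k} jk\,m(j,k), \\
r_{\mathrm{Pearson}} &= \frac{r \sum_{j,k} jk\,m(j,k)}{\sigma_j\,\sigma_k}.
\end{align*}
```

Under these assumptions the covariance is linear in $r$, and $r$ coincides with the Pearson coefficient exactly when $m$ is normalized so that $\sum_{j,k} jk\,m(j,k) = \sigma_j \sigma_k$; any other normalization only rescales it.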



Out-in correlations. Using equation (5c) and keeping the first nonzero
correction we obtain (Supplementary Information Sec. III)



=75;^(»)->"'-in)^ [Ml (W2, 1 - Wi) +Mi ( 1 - Wi , W2)],(8) 



where $n_D^{(0)}$ is the fraction of driver nodes of the uncorrelated network;
$w_1$ and $w_2$ only depend on $P(k_{\mathrm{in}}, k_{\mathrm{out}})$, and



Equation (8) predicts that $n_D$ depends linearly on $r^{(\mathrm{out-in})}$, a prediction
supported by simulations for small $r^{(\mathrm{out-in})}$ (Figs. 3c and 4c). This
behavior is also revealed by the equivalent problem of finding the



SCIENTIFIC REPORTS | 3 : 1067 | DOI: 10.1038/srep01067 







Figure 5 | One-step out-out correlations induce positive two-step
correlation. Positive (negative) correlation between neighboring nodes
means that if node A has high out-degree, then node B is likely to have high
(low) out-degree, and hence C will likely have high out-degree.

maximum matching of graphs. For a node A with out-degree $k_o$, by
definition only one edge can be in the matching. If the remaining
$k_o - 1$ edges point to nodes with degree 1 (disassortative case), A inhibits
them from being matched, so we have to control each of them individually,
increasing $n_D$. If the remaining $k_o - 1$ edges point to hubs
(assortative case), these hubs are likely to be matched through
another incoming edge, decreasing $n_D$.

Out-out correlations. The cavity method indicates that for out-out
correlations the first nonzero correction is of order $(r^{(\mathrm{out-out})})^2$:



——(out-out) :r:i— (0) 



(out-out)' W 



H' 



(m) 



■(1-Wi)M2(w2)+H(°"'>'(w2)M2(1-Wi)], 



(10) 



where (x) = Til ^ {in, out)) only depends on 
P{kin, fcout) and 



M,{x)= J2 - 

j,k = lj = 0 



(out-out) 



(/,;■) m 



(out— out) 



p(°"')(/) 



(11) 



Equation (10) predicts that $n_D^{(\mathrm{out-out})}$ does not depend on the out-out
correlation of the directly connected nodes, but only on the
correlation between the second neighbors, hence its dependence is
quadratic in $r^{(\mathrm{out-out})}$, a prediction supported by numerical simulations
(Figs. 3d and 4d). Indeed, positive (negative) correlation
between the immediate neighbors means that if node A has high out-degree,
then node B is expected to have high (low) out-degree, and
therefore C is likely to have high out-degree (Fig. 5). That is, both
positive and negative one-step out-out correlations induce positive
two-step correlations, accounting for the symmetry of the effect
observed in simulations (Figs. 3d and 4d).

In-in correlations. Switching the direction of each link does not
change the matching, but turns out-out correlations into in-in correlations.
So $n_D^{(\mathrm{in-in})}$ can be obtained by exchanging $P^{(\mathrm{in})}(k_{\mathrm{in}})$ and
$P^{(\mathrm{out})}(k_{\mathrm{out}})$ in equation (10), predicting again a quadratic dependence
on $r^{(\mathrm{in-in})}$, supported by the numerical simulations (Figs. 3a and 4a).

In-out correlations. The equations for $n_D$ do not depend on the in-degree
of the source and the out-degree of the target of a link, hence
we predict that $r^{(\mathrm{in-out})}$ does not play a role in network controllability, a
prediction supported by the simulations (see Figs. 3b and 4b).

Taken together, we predict that the functional dependence of $n_D$
on degree correlations defines three classes of behaviors, depending
on the matching problem's underlying symmetries: $n_D$ has no
dependence on $r^{(\mathrm{in-out})}$, linear dependence on $r^{(\mathrm{out-in})}$ and quadratic
dependence on $r^{(\mathrm{in-in})}$ and $r^{(\mathrm{out-out})}$. These predictions are fully supported
by numerical simulations (Figs. 3 and 4): for small $r$ we see no
dependence on $r^{(\mathrm{in-out})}$, an asymmetric, monotonic dependence on
$r^{(\mathrm{out-in})}$, and a symmetric dependence on $r^{(\mathrm{in-in})}$ and $r^{(\mathrm{out-out})}$.

To directly compare the analytical predictions to simulations we
need to know the complete $e(j_i, j_o; k_i, k_o)$ distribution, which is not
explicitly set in our simulations. So to test the results we use a rewiring
method that sets the $e(j_i, j_o; k_i, k_o)$ distribution, not only the
correlation coefficient $r$. This method is not as robust as our original
algorithm and the range of accessible $r$ values is more restricted.
However, since our results are based on a perturbation scheme, we only
expect them to be correct for small $r$ values. Indeed, we find that the
predictions quantitatively reproduce the numerical results in a fair
interval of $r^{(\alpha-\beta)}$ (Fig. 6).

Real networks. We test the predictions provided by the developed
analytical and numerical tools on a set of publicly available network
datasets. When complex systems are mapped to networks, the links
connecting the nodes represent interactions between them. In this
context self-loops represent self-interactions, with a strong, well
understood impact on controllability. While in some systems
self-loops are obviously present (e.g. neural networks), in others
they are manifestly absent (e.g. electric circuits). Our purpose
here is to test the effect of correlations, hence we rely on datasets
that capture the wiring diagram of various complex systems with
different correlation properties. Therefore, even if in a few of these
Figure 6 | The analytic formulas are tested with simulations on (a) an Erdős-Rényi model and (b) a scale-free model. We used the rewiring algorithm
cited in the main text to set $e(j_i, j_o; k_i, k_o)$. For (a) we choose $N = 1{,}000$ and $\langle k \rangle = 3$; for (b) $N = 1{,}000$, $\gamma = 2.5$ and $\langle k \rangle = 4$. Each data point is an
average over 100 independent runs; the error bars represent the standard deviation of the measurements.
Figure 7 | The observed and predicted deviation between $N_D$ and $N_D^{\mathrm{rand}}$. Red line: $\Delta = (N_D - N_D^{\mathrm{rand}}) / N$, the prediction error based on the degree
sequence. Dashed lines: correlations relevant to controllability. For each network $\Delta$ is calculated by averaging over 50 independent configurations.



maps self-loops are missing, it is beyond the scope of this work to 
complete these networks. However, when studying controllability of 
a particular system, careful thought has to be put into whether self- 
loops are present or not. We present a systematic study on the effect 
of self-loops in the Supplementary Information Sec. II.B. 

To test the impact of our predictions on real networks we calculate

$$ \Delta = \frac{N_D - N_D^{\mathrm{rand}}}{N}, \tag{12} $$

where $N_D^{\mathrm{rand}}$ represents the number of driver nodes for the degree-preserved
randomized version of the original network. Hence if $\Delta = 0$
then $P(k_{\mathrm{in}}, k_{\mathrm{out}})$ accurately determines $N_D$; if $\Delta \neq 0$ then the
structural properties not captured by the degree sequence influence
its controllability. We measure the correlations in several real networks
and based on our numerical and analytical results we predict
the sign of $\Delta$ (Fig. 7). We grouped the networks according to our
predictions. We provide the details of each network dataset in the
Supplementary Information Table SI.

Group A. The networks of p2p Internet (Gnutella filesharing clients)
do not have strong correlations, therefore we expect $n_D$ to be correctly
approximated by the prediction based on $P(k_{\mathrm{in}}, k_{\mathrm{out}})$ (i.e. $\Delta = 0$),
in line with the empirical observations.

Group B. As in most networks the three relevant correlations coexist
to some degree (Fig. 7), it is impossible to isolate their individual role.
Yet, the networks in this group (electric circuits, metabolic networks,
neural networks, power grids and food webs with the exception of the
Seagrass network) all have negative out-in and nonzero in-in and
out-out correlations, each of which individually increases $n_D$, as we
showed above. Therefore we predict $\Delta > 0$, in line with the empirical
observations.

Group C. Only the prison social-trust and the cell phone network
feature significant positive out-in correlations. These networks also
display nonzero in-in and out-out correlation, leading to the coexistence
of two competing effects: out-in correlations decrease $n_D$, and
the out-out and in-in correlations increase $n_D$. Since the out-in correlation
is a first-order effect (equation (8)), while out-out and in-in
correlations are only of second order (equation (10)), we expect a
decrease in $n_D$ (i.e. $\Delta < 0$), consistent with the empirical results.

Group D. The Seagrass food web and citation networks do not feature
significant out-in correlations, only the secondary in-in and out-out
correlations, hence we expect $n_D$ to increase ($\Delta > 0$), consistent with
the observations.

Group E. Only the transcriptional regulatory networks are somewhat
puzzling in that they show degree correlations, yet the degree
sequence still correctly gives $n_D$. However, the simulations indicated
that the effect of correlations is negligible for high $n_D$. Moreover, our
analytical results showed that the value of the correction depends
on details of $e(j_i, j_o; k_i, k_o)$ not captured by the Pearson coefficient $r$.
These observations highlight that even though in most cases our
qualitative predictions based on $r$ are valid, in some cases further
investigation is required.

Discussion 

The goal of our paper was to clarify the higher order network characteristics
that influence controllability. We studied the effect of
three topological characteristics: clustering, modularity and degree
correlations. We used numerical simulations to identify the role of
the relevant characteristics, finding that changes in the clustering
coefficient and the community structure have no systematic effect
on the minimum number of driver nodes $n_D$. In contrast, degree
correlations showed a robust effect, whose magnitude and direction
depend on the type of correlation. Using the cavity method we
derived $n_D$ for networks with given degree distribution and correlation
profiles, finding results that are consistent with our numerical
simulations. For real networks these numerical and analytic results
enabled us to qualitatively explain the deviation of the observed $n_D$
from the prediction based only on $P(k_{\mathrm{in}}, k_{\mathrm{out}})$.

Our results not only offer a new perspective on the role of topo- 
logical properties on network controllability, but also raise several 
questions. Future research directions include determining the 
optimal network structure to minimize the number of necessary 
driver nodes, and studying how different network characteristics 
influence the robustness of the control configuration. 

Methods 

Generating a scale-free network. We use the static model to generate directed scale-free
networks. We start from $N$ disconnected nodes and assign a weight $w_i = (i + i_0)^{-\alpha}$
to each node ($i = 1, \dots, N$). We randomly select two nodes $i$ and $j$ with probability
proportional to $w_i$ and $w_j$, respectively, and if they are not yet connected, we connect
them. We allow self-loops, but avoid multi-edges. We repeat the process until $L$ links
have been placed. The resulting network has average degree $\langle k \rangle = 2L/N$, and $P^{(\mathrm{in/out})}(k) \sim k^{-\gamma}$
for large $k$, where $\gamma = 1 + 1/\alpha$, and maximum degree $k_{\max} \sim i_0^{-\alpha}$.

To systematically study correlations, the starting network has to be uncorrelated.
However, the presence of hubs may induce unwanted degree correlations, and may
also considerably limit the maximum and minimum correlations accessible via
rewiring. We overcome these difficulties by introducing a structural cutoff in the
degrees, choosing $i_0$ to ensure that $k_{\max}$ does not exceed $(\langle k \rangle N)^{1/2}$. Note that in the static model of Goh
et al. $i_0 = 0$.

As both the in- and out-degree of node $i$ are proportional to $w_i$, the above procedure
results in correlations between the in- and out-degrees of node $i$. To eliminate the
correlations, we randomize the in-degree sequence while keeping the out-degree
sequence unchanged.
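
The generation procedure can be sketched as follows. This is a minimal illustration under stated assumptions: nodes are labeled 0 to n-1 (node index plus one plays the role of $i$), links are drawn until the requested number of distinct links exists, and the in-degree sequence is randomized by relabeling link targets with a random permutation, which is one possible reading of the last step. The function names and default parameters are hypothetical.

```python
import random
from itertools import accumulate


def static_model(n, n_links, gamma, i0=0, seed=0):
    """Directed static-model network: link i -> j drawn with prob. ~ w_i * w_j."""
    if n_links > n * n:
        raise ValueError("too many links for a graph without multi-edges")
    alpha = 1.0 / (gamma - 1.0)  # from gamma = 1 + 1/alpha
    weights = [(i + i0) ** -alpha for i in range(1, n + 1)]
    cum = list(accumulate(weights))
    rng = random.Random(seed)
    edges = set()
    while len(edges) < n_links:
        i, j = rng.choices(range(n), cum_weights=cum, k=2)
        edges.add((i, j))  # the set rejects multi-edges; self-loops are kept
    return sorted(edges)


def decorrelate_in_out(n, edges, seed=0):
    """Relabel targets by a random permutation.

    Every node keeps its out-degree, while the in-degrees are reassigned
    among the nodes, removing the node-level in/out correlation.
    """
    rng = random.Random(seed)
    perm = list(range(n))
    rng.shuffle(perm)
    return [(u, perm[v]) for u, v in edges]
```

Because the permutation is a bijection on targets, the relabeling cannot create multi-edges, and the multiset of in-degrees is unchanged.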

Rewiring algorithm. We use degree-preserving rewiring to add each network
characteristic. Suppose that the chosen network characteristic is quantified by a
metric $X$. To set its value to $X^*$, we define the energy $E(X) = |X - X^*|$, so $E(X^*)$ is a
global minimum. We minimize this energy by simulated annealing: (1) choose two
links at random with uniform probability; (2) rewire the two links and calculate the
energy $E(X)$ of the resulting network; (3) accept the new configuration with probability

$$ p = \begin{cases} 1, & \text{if } \Delta E \le 0, \\ e^{-\beta \Delta E}, & \text{if } \Delta E > 0, \end{cases} \tag{13} $$

where the parameter $\beta$ is the inverse temperature; (4) repeat from step one and
gradually increase $\beta$. Stop if $|E(X) - E(X^*)|$ is smaller than a predefined value.

Note that keeping the degree sequence bounds the possible values of $X$ that can be
reached by rewiring. In all cases we study the full interval of accessible $X$ values.
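
The four steps above can be sketched as a short routine. It is an illustration under stated assumptions rather than the procedure used for the reported results: the metric is recomputed from scratch after every proposal, the inverse temperature grows geometrically, and the out-in Pearson coefficient serves as the example metric. The names `anneal`, `out_in_metric`, `cooling` and `tol` are hypothetical.

```python
import math
import random
from collections import Counter


def out_in_metric(edges):
    """Pearson correlation between source out-degree and target in-degree."""
    out_deg = Counter(u for u, _ in edges)
    in_deg = Counter(v for _, v in edges)
    n_links = len(edges)
    j = [out_deg[u] for u, _ in edges]
    k = [in_deg[v] for _, v in edges]
    j_bar, k_bar = sum(j) / n_links, sum(k) / n_links
    cov = sum(a * b for a, b in zip(j, k)) / n_links - j_bar * k_bar
    s_j = math.sqrt(sum((a - j_bar) ** 2 for a in j) / n_links)
    s_k = math.sqrt(sum((b - k_bar) ** 2 for b in k) / n_links)
    return cov / (s_j * s_k) if s_j > 0 and s_k > 0 else 0.0


def anneal(edges, metric, target, beta=1.0, cooling=1.001, tol=1e-3,
           max_steps=200000, seed=0):
    """Degree-preserving rewiring that drives metric(edges) toward target."""
    rng = random.Random(seed)
    edges = list(edges)
    present = set(edges)
    energy = abs(metric(edges) - target)
    for _ in range(max_steps):
        if energy < tol:
            break
        i, j = rng.sample(range(len(edges)), 2)      # step 1: pick two links
        (a, b), (c, d) = edges[i], edges[j]
        new1, new2 = (a, d), (c, b)
        if new1 in present or new2 in present:
            continue                                 # would create a multi-edge
        edges[i], edges[j] = new1, new2              # step 2: rewire
        new_energy = abs(metric(edges) - target)
        delta = new_energy - energy
        if delta <= 0 or rng.random() < math.exp(-beta * delta):
            present -= {(a, b), (c, d)}              # step 3: accept
            present |= {new1, new2}
            energy = new_energy
        else:
            edges[i], edges[j] = (a, b), (c, d)      # reject: undo the swap
        beta *= cooling                              # step 4: cool down
    return edges, energy
```

Recomputing the metric after every swap costs time proportional to the number of links; for large networks an incremental update of the affected terms would be the natural refinement.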